ggplot2: extensions
library(tidyverse)
While ggplot2 comes with a lot of batteries included, the extension ecosystem provides invaluable additional features.
For composing several plots into one figure we use patchwork here, but note that cowplot is also a popular alternative.
We start by creating three separate plots:
data("msleep", package = "ggplot2")
p1 <- ggplot(msleep) +
geom_boxplot(aes(x = sleep_total, y = vore, fill = vore))
p1
p2 <- ggplot(msleep) +
geom_bar(aes(y = vore, fill = vore))
p2
p3 <- ggplot(msleep) +
geom_point(aes(x = bodywt, y = sleep_total, colour = vore)) +
scale_x_log10()
p3
Combining them with patchwork is a breeze using the different operators
library(patchwork)
p1 + p2 + p3
p_all <- (p1 | p2) / p3
p_all
p_all + plot_layout(guides = 'collect')
p_all & theme(legend.position = 'none') ## new operator, operates on all plots
p_all <- p_all & theme(legend.position = 'none')
p_all + plot_annotation(
title = 'Mammalian sleep patterns',
tag_levels = 'A'
)
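Because patchwork reuses R's own operators, R's usual precedence rules decide how a composition is grouped. The following is a minimal base R sketch of that point, where a, b and c are placeholder symbols rather than plots and top_call() is a hypothetical helper written only for this illustration:

```r
# quote() only parses the expression; nothing is evaluated, so a, b and c
# do not need to exist. top_call() is a hypothetical helper for this sketch.
top_call <- function(expr) as.character(expr[[1]])

top_call(quote(a | b / c))    # "|": `/` binds tighter, so this is a | (b / c)
top_call(quote((a | b) / c))  # "/": the parentheses group the side-by-side pair first
top_call(quote(a + b & c))    # "&": `+` binds tighter, so `&` applies to all of a + b
```

This is why p_all is written as (p1 | p2) / p3: without the parentheses, p2 and p3 would be stacked first and p1 placed beside that stack.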
Patchwork will assign the same amount of space to each plot by default, but this can be controlled with the widths and heights arguments in plot_layout(). Each takes a numeric vector giving the relative sizes (e.g. c(2, 1) will make the first plot twice as big as the second). Modify the code below so that the middle plot takes up half of the total space:
p <- ggplot(mtcars) +
geom_point(aes(x = disp, y = mpg))
p + p + p
p + p + p + plot_layout(widths = c(1, 2, 1))
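As a quick check of the arithmetic, the relative sizes can be normalised to fractions of the total. This is plain base R and treats the sizes purely as relative numbers; how patchwork distributes the space needed for axes and legends around the panels is not modelled here:

```r
widths <- c(1, 2, 1)            # relative widths passed to plot_layout()
shares <- widths / sum(widths)  # fraction of the total width per plot
shares
# 0.25 0.50 0.25, so the middle plot gets half of the width
```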
The & operator can be used with any type of ggplot2 object, not just themes. Modify the code below so the two plots share the same y-axis (same limits):
p1 <- ggplot(mtcars[mtcars$gear == 3,]) +
geom_point(aes(x = disp, y = mpg))
p2 <- ggplot(mtcars[mtcars$gear == 4,]) +
geom_point(aes(x = disp, y = mpg))
p1 + p2
p1 + p2 & coord_cartesian(ylim = c(0, 40))
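Instead of hard-coding the limits, they could be derived from the data. A small base R sketch, with variable names chosen here for illustration:

```r
# mpg range within each subset: left alone, the two panels would get
# different default limits
range_gear3 <- range(mtcars$mpg[mtcars$gear == 3])
range_gear4 <- range(mtcars$mpg[mtcars$gear == 4])

# Range across both subsets: one set of limits that fits every point
shared_limits <- range(c(range_gear3, range_gear4))
shared_limits
```

Passing shared_limits as ylim to coord_cartesian() in the & call above should then give both plots the same, tightly fitting y-axis.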
Patchwork contains many features for fine-tuning the layout and annotation. Very complex layouts can be obtained by providing a design specification to the design argument in plot_layout(). The design can be defined as a textual representation of the cells. Use the layout given below. How should the textual representation be understood?
p1 <- ggplot(mtcars) +
geom_point(aes(x = disp, y = mpg))
p2 <- ggplot(mtcars) +
geom_bar(aes(x = factor(gear)))
p3 <- ggplot(mtcars) +
geom_boxplot(aes(x = factor(gear), y = mpg))
layout <- '
AA#
#BB
C##
'
p1 + p2 + p3 + plot_layout(design = layout)
layout <- '
11#
112
332
##2
'
p1 + p2 + p3 + plot_layout(design = layout)
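One way to read a design string is as a grid of cells. The sketch below uses only base R to turn the first layout into a character matrix; it illustrates the reading and is not how patchwork parses the string internally:

```r
layout <- '
AA#
#BB
C##
'

# Drop the surrounding newlines, split into rows, then into single cells
rows <- strsplit(trimws(layout), "\n")[[1]]
grid <- do.call(rbind, strsplit(rows, ""))
grid

# Cells covered by each area; '#' marks cells that stay empty
which(grid == "A", arr.ind = TRUE)  # row 1, columns 1 to 2
which(grid == "B", arr.ind = TRUE)  # row 2, columns 2 to 3
which(grid == "C", arr.ind = TRUE)  # row 3, column 1
```

Each distinct character names one rectangular area, and the plots appear to be placed into the areas in order, so here p1 fills A, p2 fills B and p3 fills C. The second layout reads the same way, with 1, 2 and 3 as the area names.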
ggplot2 is usually focused on static plots, but gganimate extends the API and grammar to describe animations. As such it feels like a very natural extension of using ggplot2.
ggplot(economics) +
geom_line(aes(x = date, y = unemploy))
library(gganimate)
ggplot(economics) +
geom_line(aes(x = date, y = unemploy)) +
transition_reveal(along = date)
There are many different transitions that control how data is interpreted for animation, as well as a range of other animation specific features.
ggplot(mpg) +
geom_bar(aes(x = factor(cyl)))
ggplot(mpg) +
geom_bar(aes(x = factor(cyl))) +
labs(title = 'Number of cars in {closest_state} by number of cylinders') +
transition_states(states = year) +
enter_grow() +
exit_fade()
The animation below will animate between points showing cars with different cylinders.
ggplot(mpg) +
geom_point(aes(x = displ, y = hwy)) +
ggtitle("Cars with {closest_state} cylinders") + ## string interpolation with glue
transition_states(factor(cyl))
gganimate uses the group aesthetic to match observations between states. By default the group aesthetic is set to the same value, so observations are matched by their position (the first row of 4 cyl is matched to the first row of 5 cyl, etc.). This is clearly wrong here (why?). Add a mapping to the group aesthetic to ensure that points do not move between the different states.
ggplot(mpg) +
geom_point(aes(x = displ, y = hwy, group = factor(cyl))) +
ggtitle("Cars with {closest_state} cylinders") + ## string interpolation with glue
transition_states(factor(cyl))
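A quick look at the data hints at why matching by row position is wrong. The sketch assumes ggplot2 is installed, since the mpg dataset ships with it; cars_per_state is a name chosen for this illustration:

```r
# Number of cars in each state of the animation
cars_per_state <- table(ggplot2::mpg$cyl)
cars_per_state

# The states differ in size, so row i of one state cannot correspond to
# row i of the next. Each row is a different car model, not the same car
# observed again with another cylinder count.
length(unique(as.vector(cars_per_state))) > 1
```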
In the presence of discrete aesthetic mappings (colour below), the group is deduced if not given. The default behaviour of objects that appear and disappear during the animation is to simply pop in and out of existence. The enter_*() and exit_*() functions can be used to control this behaviour. Experiment with the different enter and exit functions provided by gganimate below. What happens if you add multiple enter or exit functions to the same animation?
ggplot(mpg) +
geom_point(aes(x = displ, y = hwy, color = factor(cyl))) +
ggtitle("Cars with {closest_state} cylinders") +
transition_states(factor(cyl)) +
enter_fade() +
exit_shrink()
In the animation below (as in all the other animations) the changes happen at constant speed. How values change during an animation is called easing and can be controlled using the ease_aes() function. Read the documentation for ease_aes() and experiment with different easings in the animation.
mpg2 <- tidyr::pivot_longer(mpg, c(cty, hwy))
ggplot(mpg2) +
geom_point(aes(x = displ, y = value)) +
ggtitle("{if (closest_state == 'cty') 'Efficiency in city' else 'Efficiency on highway'}") +
transition_states(name) +
ease_aes("bounce-in-out")
ggplot(mpg2) +
geom_point(aes(x = displ, y = value)) +
ggtitle("{if (closest_state == 'cty') 'Efficiency in city' else 'Efficiency on highway'}") +
transition_states(name) +
ease_aes("elastic-out")
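To build intuition for what an easing does, the sketch below compares a linear progression with a hand-written cubic-in-out curve in base R. The function cubic_in_out() is written for this illustration using the common textbook formula; it is not taken from gganimate's implementation:

```r
# t is the progress through a transition, from 0 (start) to 1 (end)
linear <- function(t) t

# Slow start, fast middle, slow end
cubic_in_out <- function(t) {
  ifelse(t < 0.5, 4 * t^3, 1 - (-2 * t + 2)^3 / 2)
}

t <- seq(0, 1, by = 0.25)
data.frame(t = t, linear = linear(t), cubic_in_out = cubic_in_out(t))
```

Both curves start at 0 and end at 1, but a quarter of the way in the cubic curve has covered only about 6% of the distance. In gganimate the corresponding easing would be requested with ease_aes('cubic-in-out').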
Text is a huge part of storytelling with your visualisation. Historically, textual annotations have not been the best part of ggplot2, but new extensions make up for that.
Standard geom_text will often result in overlapping labels
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point() +
geom_text(aes(label = row.names(mtcars)))
ggrepel takes care of that
library(ggrepel)
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point() +
geom_text_repel(aes(label = row.names(mtcars)))
If you want to highlight certain parts of your data and describe them, the geom_mark_*() family of geoms has your back
library(ggforce)
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point() +
geom_mark_ellipse(aes(filter = gear == 4,
label = '4 gear cars',
description = 'Cars with fewer gears tend to both have higher yield and lower displacement'))
ggrepel has a ton of settings for controlling how text labels move. Often, though, the most effective approach is simply not to label everything. There are two strategies for that: either use only a subset of the data for the repel layer, or set the label to "" for those you don’t want to plot. Try both in the plot below, where you only label 10 random points.
mtcars2 <- mtcars
mtcars2$label <- rownames(mtcars2)
points_to_label <- sample(nrow(mtcars), 10)
ggplot(mtcars2, aes(x = disp, y = mpg)) +
geom_point() +
geom_text_repel(data = mtcars2[points_to_label, ], aes(label = label))
mtcars2$label[-points_to_label] <- ""
ggplot(mtcars2, aes(x = disp, y = mpg)) +
geom_point() +
geom_text_repel(aes(label = label))
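The two strategies differ in what the repel layer gets to know about. A base R sketch of the data each one hands to geom_text_repel(), using set.seed() (an addition made here) so that the same ten points are drawn on every run; cars, subset_data and blank_data are names chosen for this illustration:

```r
set.seed(1)  # assumption: any fixed seed works, it only makes the draw repeatable
cars <- mtcars
cars$label <- rownames(cars)
points_to_label <- sample(nrow(cars), 10)

# Strategy 1: the layer only receives the ten chosen rows
subset_data <- cars[points_to_label, ]
nrow(subset_data)

# Strategy 2: the layer receives every row, but most labels are empty
blank_data <- cars
blank_data$label[-points_to_label] <- ""
nrow(blank_data)
sum(blank_data$label != "")
```

With the second strategy the unlabelled points are still part of the layer, so, according to the ggrepel documentation, the visible labels are also pushed away from them. With the first strategy the labels only avoid each other and the ten labelled points.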
Explore the documentation for geom_text_repel. Find a way to ensure that the labels in the plot below only repel in the vertical direction
ggplot(mtcars2, aes(x = disp, y = mpg)) +
geom_point() +
geom_text_repel(aes(label = label), direction = "y")
ggforce comes with 4 different types of mark geoms. Try them all out in the code below:
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point() +
geom_mark_ellipse(aes(filter = gear == 4, label = '4 gear cars'))
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point() +
geom_mark_circle(aes(filter = gear == 4, label = '4 gear cars'))
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point() +
geom_mark_hull(aes(filter = gear == 4, label = '4 gear cars'), concavity = 10)
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point() +
geom_mark_rect(aes(filter = gear == 4, label = '4 gear cars'))
In the future, ggtext will make styling text with Markdown- and CSS-like syntax relatively easy.
ggplot2 has been focused on tabular data. Network data in any shape and form is handled by ggraph.
library(ggraph)
library(tidygraph)
graph <- create_notable('zachary') %>%
mutate(clique = as.factor(group_infomap()))
graph
# A tbl_graph: 34 nodes and 78 edges
#
# An undirected simple graph with 1 component
#
# Node Data: 34 x 1 (active)
clique
<fct>
1 2
2 2
3 2
4 2
5 3
6 3
# … with 28 more rows
#
# Edge Data: 78 x 2
from to
<int> <int>
1 1 2
2 1 3
3 1 4
# … with 75 more rows
ggraph(graph) +
geom_mark_hull(aes(x, y, fill = clique)) +
geom_edge_link() +
geom_node_point(size = 2)
Using `stress` as default layout
Dendrograms are just a specific type of network
iris_clust <- hclust(dist(iris[, 1:4]))
ggraph(iris_clust) +
geom_edge_bend() +
geom_node_point(aes(filter = leaf))
Using `dendrogram` as default layout
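To see the network inside a clustering result, the merge history of an hclust object can be counted as nodes and edges. This base R sketch only does the bookkeeping; how ggraph converts the object internally is not shown:

```r
iris_clust <- hclust(dist(iris[, 1:4]))

n_leaves <- length(iris_clust$order)  # one leaf per observation
n_merges <- nrow(iris_clust$merge)    # each merge creates one internal node

n_nodes <- n_leaves + n_merges
n_edges <- 2 * n_merges               # every merge joins exactly two children

c(nodes = n_nodes, edges = n_edges)

# A tree has one edge fewer than it has nodes
n_edges == n_nodes - 1
```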
Most network plots are defined by a layout algorithm, which takes the network structure and calculates a position for each node. The layout algorithm is global and set in ggraph(). The default auto layout will inspect the network object and try to choose a sensible layout for it (e.g. dendrogram for a hierarchical clustering as above). There is, however, no optimal layout and it is often a good idea to try out different layouts. Try out different layouts in the graph below. See the website for an overview of the different layouts.
ggraph(graph, layout = "kk") +
geom_edge_link() +
geom_node_point(aes(colour = clique), size = 3)
ggraph(graph, layout = "eigen") +
geom_edge_link() +
geom_node_point(aes(colour = clique), size = 3)
ggraph(graph, layout = "backbone") +
geom_edge_link() +
geom_node_point(aes(colour = clique), size = 3)
There are many different ways to draw edges. Try to use geom_edge_parallel() in the graph below to show the presence of multiple edges
highschool_gr <- as_tbl_graph(highschool)
highschool_gr
# A tbl_graph: 70 nodes and 506 edges
#
# A directed multigraph with 1 component
#
# Node Data: 70 x 1 (active)
name
<chr>
1 1
2 2
3 3
4 4
5 5
6 6
# … with 64 more rows
#
# Edge Data: 506 x 3
from to year
<int> <int> <dbl>
1 1 13 1957
2 1 14 1957
3 1 20 1957
# … with 503 more rows
ggraph(highschool_gr) +
geom_edge_parallel() +
geom_node_point()
Using `stress` as default layout
ggraph(highschool_gr) +
geom_edge_fan() +
geom_node_point()
Using `stress` as default layout
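The multiple edges can also be found directly in the data. The sketch assumes ggraph is installed (the highschool data frame ships with it) and counts the rows whose ordered pair of students has already appeared in an earlier row; edges and repeated are names chosen for this illustration:

```r
edges <- ggraph::highschool  # columns: from, to, year

# TRUE for every row whose from/to pair already appeared in an earlier row
repeated <- duplicated(edges[, c("from", "to")])
sum(repeated)

# These are the pairs that geom_edge_parallel() and geom_edge_fan() draw as
# separate lines instead of on top of each other
head(edges[repeated, ])
```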
Faceting works in ggraph as it does in ggplot2, but you must choose to facet by either nodes or edges. Modify the graph below to facet the edges by the year variable (using facet_edges())
ggraph(highschool_gr) +
geom_edge_fan() +
geom_node_point() +
facet_edges(~year)
Using `stress` as default layout
Many people have already designed beautiful (and horrible) themes for you. Use them as a base.
p <- ggplot(mtcars, aes(mpg, wt)) +
geom_point(aes(color = factor(carb))) +
labs(
x = 'Fuel efficiency (mpg)',
y = 'Weight (1000 lbs)',
title = 'Seminal ggplot2 example',
subtitle = 'A plot to show off different themes',
caption = 'Source: It’s mtcars — everyone uses it'
)
library(hrbrthemes)
p + scale_colour_ipsum() +
theme_ipsum()
library(ggthemes)
p + scale_colour_excel() +
theme_excel()
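Using a theme as a base means adding your own theme() tweaks on top of it. A minimal sketch with a theme that ships with ggplot2 itself; my_theme, p_themed and the chosen settings are illustrative, not a recommendation:

```r
library(ggplot2)

# Start from a complete theme, then override individual elements
my_theme <- theme_minimal(base_size = 14) +
  theme(
    legend.position = "bottom",
    plot.title = element_text(face = "bold")
  )

p_themed <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point(aes(colour = factor(carb))) +
  my_theme
p_themed
```

The same pattern should work with the themes from hrbrthemes and ggthemes shown above: add a theme() call after the complete theme to adjust the parts you disagree with.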
states <- c(
'eaten', "eaten but said you didn\'t", 'cat took it', 'for tonight',
'will decompose slowly'
)
pie <- data.frame(
state = factor(states, levels = states),
amount = c(4, 3, 1, 1.5, 6),
stringsAsFactors = FALSE
)
ggplot(pie) +
geom_col(aes(x = 0, y = amount, fill = state))
ggplot(pie) +
geom_col(aes(x = 0, y = amount, fill = state)) +
coord_polar(theta = 'y')
ggplot(pie) +
geom_col(aes(x = 0, y = amount, fill = state)) +
coord_polar(theta = 'y') +
scale_fill_tableau(name = NULL, guide = guide_legend(ncol = 2)) +
theme_void() +
theme(legend.position = 'top')
ggplot(pie) +
geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = amount, fill = state), stat = 'pie') +
coord_fixed()
ggplot(pie) +
geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = amount, fill = state), stat = 'pie') +
coord_fixed() +
scale_fill_tableau(name = NULL,
guide = guide_legend(ncol = 2)) +
theme_void() +
theme(legend.position = 'top',
legend.justification = 'left')
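The pie stat used with geom_arc_bar() turns amounts into angles. A rough base R sketch of that conversion, written here for illustration and not copied from ggforce:

```r
amount <- c(4, 3, 1, 1.5, 6)

share <- amount / sum(amount)    # fraction of the pie per slice
end   <- 2 * pi * cumsum(share)  # angle where each slice ends, in radians
start <- c(0, head(end, -1))     # each slice starts where the previous one ended

data.frame(amount, percent = round(100 * share, 1), start, end)
```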